205 research outputs found

    From biological models to plant improvement (Des modèles biologiques à l'amélioration des plantes)

    Superstrings with multiplicities

    A superstring of a set of words P = {s_1, …, s_p} is a string that contains each word of P as a substring. Given P, the well-known Shortest Linear Superstring problem (SLS) asks for a shortest superstring of P. In a variant of SLS called Multi-SLS, each word s_i comes with an integer m(i), its multiplicity, which constrains its number of occurrences, and the goal is to find a shortest superstring that contains at least m(i) occurrences of s_i. Multi-SLS generalizes SLS and is therefore at least as hard to solve, but it has been studied only in special cases (with words of length 2 or with a fixed number of words). The approximability of Multi-SLS in the general case remains open. Here, we study the approximability of Multi-SLS and that of the companion problem Multi-SCCS, which asks for a shortest cyclic cover instead of a shortest superstring. First, we investigate the approximation ratio of a greedy algorithm for maximizing the compression offered by a superstring or by a cyclic cover: the ratio is 1/2 for Multi-SLS and 1 for Multi-SCCS. Then, we exhibit a linear-time approximation algorithm, Concat-Greedy, and show that it achieves a ratio of 4 with respect to the superstring length. This demonstrates that, for both measures, Multi-SLS belongs to the class of APX problems. © 2018 Yoshifumi Sakai; licensed under Creative Commons License CC-BY. Peer reviewed
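To make the definitions concrete, here is a minimal sketch of the classic greedy heuristic for plain SLS (repeatedly merge the two words with the largest overlap), together with a feasibility check for the multiplicity constraint. It is not the Concat-Greedy algorithm of the abstract, whose details are not given here. The function names are hypothetical, and counting overlapping occurrences is an assumption.

```python
def overlap(a, b):
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(words):
    """Classic greedy heuristic for SLS (illustrative, quadratic per round)."""
    words = sorted(set(words))
    # Words contained in another word are covered for free, so drop them.
    words = [w for w in words if not any(w != v and w in v for v in words)]
    while len(words) > 1:
        # Pick an ordered pair (a, b) with maximal overlap and merge it.
        k, a, b = max(
            ((overlap(a, b), a, b) for a in words for b in words if a != b),
            key=lambda t: t[0],
        )
        words = [w for w in words if w not in (a, b)] + [a + b[k:]]
    return words[0] if words else ""


def count_occurrences(text, word):
    """Number of occurrences of word in text, overlapping ones included."""
    return sum(text.startswith(word, i) for i in range(len(text) - len(word) + 1))


def satisfies_multiplicities(s, multiplicity):
    """Check the Multi-SLS constraint: each word w occurs at least m times in s."""
    return all(count_occurrences(s, w) >= m for w, m in multiplicity.items())
```

For example, `greedy_superstring(["abc", "bcd", "cde"])` merges the three words into `"abcde"`, and `satisfies_multiplicities("aaa", {"aa": 2})` holds because the two occurrences of `"aa"` overlap.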

    Detection of recombination in variable number tandem repeat sequences

    Tandem repeats are repeated sequences whose copies are adjacent along the chromosomes. They account for a large portion of eukaryotic genomes and are found in all types of living organisms. Among tandem repeats, those with a repeat unit of middle size are called minisatellites. These loci depart from classical loci because of their propensity to vary in size through the addition or removal of one or more repeat units. Due to this polymorphism, they prove useful in genetic mapping, population genetics, and forensic medicine. Moreover, some specific tandem repeat loci are involved in diseases, like the insulin minisatellite, which is implicated in type I diabetes and obesity. Those loci also undergo complex recombination events. Presently, some programs to compare tandem repeat alleles exist and yield good results when recombination is absent, but none correctly handles recombinant alleles. Our goal is to develop an adequate tool for the detection of recombinants among a set of minisatellite sequences. By combining a multiple alignment tool and a method based on phylogenetic profiling, we design a first solution, called MS_PhylPro, for this task. The method has been implemented, tested on real data sets from the insulin minisatellite, and proven to detect recombinant alleles.

    Convergence of the number of period sets in strings

    Consider words of length n. The set of all periods of a word of length n is a subset of {0,1,2,…,n−1}. However, not every subset of {0,1,2,…,n−1} is a valid set of periods. In a seminal paper in 1981, Guibas and Odlyzko proposed encoding the set of periods of a word as a binary string of length n, called an autocorrelation, where a one at position i denotes a period of i. They considered the question of recognizing a valid period set, and also studied the number of valid period sets for length n, denoted κ_n. They conjectured that ln(κ_n) asymptotically converges to a constant times ln^2(n). Although improved lower bounds for ln(κ_n)/ln^2(n) were proposed in 2001, the question of a tight upper bound has remained open since Guibas and Odlyzko's paper. Here, we exhibit an upper bound for this fraction, which implies its convergence and closes this long-standing conjecture. Moreover, we extend our result to find similar bounds for the number of correlations: a generalization of autocorrelations which encodes the overlaps between two strings.
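The autocorrelation encoding and the quantity κ_n can be illustrated with a short brute-force sketch. It computes the autocorrelation of a word as described above, then counts the distinct autocorrelations over all words of length n. The function names are hypothetical, the enumeration is exponential in n, and the binary alphabet is an assumption (the set of autocorrelations is understood not to depend on the alphabet size once it has at least two letters).

```python
from itertools import product


def autocorrelation(word):
    """Binary string whose bit i is 1 iff i is a period of word.

    i is a period when the suffix starting at i equals the prefix of the
    same length; 0 is always a period.
    """
    n = len(word)
    return "".join("1" if word[i:] == word[: n - i] else "0" for i in range(n))


def count_period_sets(n, alphabet="ab"):
    """Brute-force kappa_n: number of distinct autocorrelations of length n."""
    return len({autocorrelation("".join(w)) for w in product(alphabet, repeat=n)})


if __name__ == "__main__":
    print(autocorrelation("abab"))   # periods 0 and 2
    print(autocorrelation("aabaa"))  # periods 0, 3 and 4
    print([count_period_sets(n) for n in range(1, 6)])
```

For n = 4, for instance, only 4 of the 8 subsets of {0,1,2,3} that contain 0 are valid period sets: a word with periods 2 and 3 is forced to have period 1 as well.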

    CpG-ODN-induced sustained expression of BTLA mediating selective inhibition of human B cells.

    BTLA (B- and T-lymphocyte attenuator) is a prominent co-receptor that is structurally and functionally related to CTLA-4 and PD-1. In T cells, BTLA inhibits TCR-mediated activation. In B cells, the roles and functions of BTLA are still poorly understood and have never been studied in the context of B cells activated by CpG via TLR9. In this study, we evaluated the expression of BTLA depending on the activation and differentiation of human B cell subsets in peripheral blood and lymph nodes. Stimulation with CpG upregulated BTLA, but not its ligand, herpes virus entry mediator (HVEM), on B cells in vitro and sustained its expression in vivo in melanoma patients after vaccination. Upon ligation with HVEM, BTLA inhibited CpG-mediated B cell functions (proliferation, cytokine production, and upregulation of co-stimulatory molecules), which was reversed by blocking BTLA/HVEM interactions. Interestingly, chemokine secretion (IL-8 and MIP1β) was not affected by BTLA/HVEM ligation, suggesting that BTLA-mediated inhibition is selective for some but not all B cell functions. We conclude that BTLA is an important immune checkpoint for B cells, as is already known for T cells.

    ProbCD: enrichment analysis accounting for categorization uncertainty

    As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high-throughput-based datasets, current enrichment methods largely ignore this probabilistic information since they are mainly based on variants of the Fisher Exact Test. We developed an open-source R package to deal with probabilistic categorical data analysis, ProbCD, that does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An on-line interface was created to allow usage by non-programmers and is available at: http://xerad.systemsbiology.net/ProbCD/. We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning the enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of the high-throughput experimental techniques and (ii) probabilistic gene annotation.
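The idea of replacing a static contingency table by its expectation can be sketched as follows. This is an illustration of the general principle only, not ProbCD's actual code or API (ProbCD is an R package). The function name is hypothetical, and treating the two categorizations of each item as independent is an assumption made here for simplicity.

```python
def expected_contingency_table(p_a, p_b):
    """Expected 2x2 table from per-item membership probabilities.

    p_a[i] is the probability that item i belongs to category A (for
    example, the gene list of interest) and p_b[i] the probability that
    it belongs to category B (for example, an annotation term).
    Assumption: the two memberships of an item are independent.
    """
    pairs = list(zip(p_a, p_b))
    n11 = sum(a * b for a, b in pairs)              # in A and in B
    n10 = sum(a * (1 - b) for a, b in pairs)        # in A, not in B
    n01 = sum((1 - a) * b for a, b in pairs)        # not in A, in B
    n00 = sum((1 - a) * (1 - b) for a, b in pairs)  # in neither
    return [[n11, n10], [n01, n00]]
```

When every probability is 0 or 1, the result reduces to the ordinary table of counts, so the usual crisp analysis is recovered as a special case. The cells always sum to the number of items.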

    Neural Modeling and Control of Diesel Engine with Pollution Constraints

    The paper describes a neural approach for modelling and control of a turbocharged Diesel engine. A neural model, whose structure is mainly based on some physical equations describing the engine behaviour, is built for the rotation speed and the exhaust gas opacity. The model is composed of three interconnected neural submodels, each of them constituting a nonlinear multi-input single-output error model. The structural identification and the parameter estimation from data gathered on a real engine are described. The neural direct model is then used to determine a neural controller of the engine, in a specialized training scheme minimising a multivariable criterion. Simulations show the effect of the pollution constraint weighting on trajectory tracking of the engine speed. Neural networks, which are flexible and parsimonious nonlinear black-box models with universal approximation capabilities, can accurately describe or control complex nonlinear systems with little a priori theoretical knowledge. The presented work extends optimal neuro-control to the multivariable case and shows the flexibility of neural optimisers. Considering the preliminary results, it appears that neural networks can be used as embedded models for engine control, to satisfy increasingly restrictive pollutant emission legislation. In particular, they are able to model nonlinear dynamics and, during transients, outperform control schemes based on static mappings. Comment: 15 pages

    Detecting microsatellites within genomes: significant variation among algorithms

    Background: Microsatellites are short, tandemly repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Results: Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameter settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Conclusion: Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.
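The sensitivity to user-defined parameters can be seen even in a toy detector. The sketch below finds perfect microsatellites with a regular expression; it is not an implementation of TRF, Mreps, Sputnik, STAR or RepeatMasker, and the function name, parameter names and default thresholds are all assumptions chosen for illustration. Because the scan is left to right and non-overlapping, it can miss a repeat that starts inside a previously matched run.

```python
import re


def find_perfect_microsatellites(seq, max_motif=6, min_repeats=3, min_length=8):
    """Return sorted (start, end, motif, copies) tuples for perfect repeats.

    max_motif, min_repeats and min_length play the role of the
    user-defined parameters mentioned in the abstract (hypothetical names).
    """
    hits = []
    for k in range(1, max_motif + 1):
        # A motif of k bases followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT"),
            # since they are already reported for the shorter motif length.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            length = m.end() - m.start()
            if length >= min_length:
                hits.append((m.start(), m.end(), motif, length // k))
    return sorted(hits)


if __name__ == "__main__":
    seq = "GGG" + "CA" * 5 + "TTG" + "AAT" * 3 + "C"
    print(find_perfect_microsatellites(seq))                 # two loci
    print(find_perfect_microsatellites(seq, min_length=10))  # one locus
```

Raising `min_length` from 8 to 10 drops the (AAT)3 locus while keeping (CA)5, so the empirical size distribution changes with the threshold alone, most visibly for short repeats.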

    String Matching and 1d Lattice Gases

    We calculate the probability distributions for the number of occurrences $n$ of a given $l$-letter word in a random string of $k$ letters. Analytical expressions for the distribution are known for the asymptotic regimes (i) $k \gg r^l \gg 1$ (Gaussian) and (ii) $k, l \to \infty$ such that $k/r^l$ is finite (Compound Poisson). However, it is known that these distributions do not work well in the intermediate regime $k \gtrsim r^l \gtrsim 1$. We show that the problem of calculating the string matching probability can be cast as that of determining the configurational partition function of a 1d lattice gas with interacting particles, so that the matching probability becomes the grand-partition sum of the lattice gas, with the number of particles corresponding to the number of matches. We perform a virial expansion of the effective equation of state and obtain the probability distribution. Our result reproduces the behavior of the distribution in all regimes. We are also able to show analytically how the limiting distributions arise. Our analysis builds on the fact that the effective interactions between the particles consist of a relatively strong core of size $l$, the word length, followed by a weak, exponentially decaying tail. We find that the asymptotic regimes correspond to the case where the tail of the interactions can be neglected, while in the intermediate regime it needs to be kept in the analysis. Our results are readily generalized to the case where the random strings are generated by more complicated stochastic processes such as a non-uniform letter probability distribution or Markov chains. We show that in these cases the tails of the effective interactions can be made even more dominant, thus rendering the asymptotic approximations less accurate in such a regime. Comment: 44 pages and 8 figures. Major revision of the previous version. The lattice gas analogy has been worked out in full, including the virial expansion and equation of state. This now constitutes the main part of the paper. Connections with existing work are made and references should now be up to date. To be submitted for publication.
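For very small $k$, the distribution of the number of matches can be obtained exactly by enumerating every string, which gives a reference point against which approximations can be compared. The sketch below does this by brute force; it is not the lattice-gas method of the abstract. The function names are hypothetical, uniform independent letters are assumed, and $r$ is taken here to be the alphabet size, which the abstract does not state explicitly.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def count_occurrences(text, word):
    """Number of occurrences of word in text, overlapping ones included."""
    return sum(text.startswith(word, i) for i in range(len(text) - len(word) + 1))


def exact_match_distribution(word, k, alphabet):
    """Exact P(n) over all r**k uniform random strings of length k.

    Enumeration costs r**k string scans, so it is only feasible for small k.
    """
    counts = Counter(
        count_occurrences("".join(s), word) for s in product(alphabet, repeat=k)
    )
    total = len(alphabet) ** k
    return {n: Fraction(c, total) for n, c in sorted(counts.items())}


if __name__ == "__main__":
    print(exact_match_distribution("ab", 3, "ab"))
    print(exact_match_distribution("aa", 3, "ab"))
```

With $k = 3$ and $r = 2$, the words "ab" and "aa" both have mean $(k - l + 1)/r^l = 1/2$, yet their distributions differ: "aa" can overlap itself and so can occur twice, whereas "ab" cannot. This dependence on the word's self-overlap structure is the kind of correlation between matches that the interacting-particle picture accounts for.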